64 research outputs found

BORPH: operating system support on the NetFPGA platform

Author: Hamilton BK, So HKH
Publication date: 01/01/2010

This paper introduces the concepts behind BORPH, an operating system for reconfigurable computers. The porting and implementation of this operating system for the NetFPGA platform, as well as the tool flow integration, are described.

The 2nd North American NetFPGA Developers Workshop 2010, Stanford, CA, 12-13 August 2010

HKU Scholars Hub

Architecture for quadruple precision floating point division with multi-precision support

Author: Jaiswal MK, So HKH
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2016

HKU Scholars Hub

A unified hardware/software runtime environment for FPGA-based reconfigurable computers using BORPH

Author: Brodersen R, So HKH
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2008

This paper explores the design and implementation of BORPH, an operating system designed for FPGA-based reconfigurable computers. Hardware designs execute as normal UNIX processes under BORPH and have access to standard OS services, such as file system support. Hardware and software components of user designs may therefore run as communicating processes within BORPH's runtime environment. The familiar, language-independent UNIX kernel interface facilitates easy design reuse and rapid application development. To develop hardware designs, a Simulink-based design flow that integrates with BORPH is employed. The performance of BORPH on two on-chip systems implemented on a BEE2 platform is compared. © 2008 ACM.

HKU Scholars Hub

Dynamic power reduction of FPGA-based reconfigurable computers using precomputation

Author: So HKH, Tsang CC
Publication venue: Association for Computing Machinery (ACM)
Publication date: 01/01/2010

This paper examines the effectiveness of employing pre-computation techniques to reduce the power consumption of field configurable computing systems. Multipliers are modified with pre-computation techniques and implemented using commercial off-the-shelf FPGAs. Pre-computation techniques reduce the dynamic power consumption of a module by eliminating unnecessary signal switching activity in inactive portions of the module. Experiments have shown that up to 52% of logic and signal power consumption can be eliminated in the multiplier module. Furthermore, when compared to ASIC implementations, FPGA implementations of pre-computation modules have the advantage of lower area overhead, as most of them can be implemented using related FPGA resources that were originally unoccupied. Finally, it was found that the effectiveness of pre-computation depends heavily on the input data statistics. It is expected that compilers for future reconfigurable computers may take full advantage of such power saving techniques by optimizing the architecture according to data input statistics.

The 1st International Workshop on Highly Efficient Accelerators and Reconfigurable Technologies (HEART), Tsukuba, Japan, 1 June 2010. In ACM SIGARCH Computer Architecture News, 2010, v. 38 n. 4, p. 87-9

HKU Scholars Hub

A soft processor overlay with tightly-coupled FPGA accelerator

Author: Liu C, Ng HC, So HKH
Publication date: 01/01/2016

FPGA overlays are commonly implemented as coarse-grained reconfigurable architectures with the goal of improving designers' productivity by balancing flexibility and ease of configuration of the underlying fabric. To truly facilitate full application acceleration, it is often necessary to also include a highly efficient processor that integrates and collaborates with the accelerators while maintaining the benefits of being implemented within the same overlay framework. This paper presents an open-source soft processor that is designed to couple tightly with FPGA accelerators as part of an overlay framework. RISC-V is chosen as the instruction set for its openness and portability, and the soft processor is designed as a 4-stage pipeline to balance resource consumption and performance when implemented on FPGAs. The processor is generically implemented so as to promote design portability and compatibility across different FPGA platforms. Experimental results show that integrated software-hardware applications using the proposed tightly-coupled architecture achieve performance comparable to hardware-only accelerators, while the proposed architecture provides additional run-time flexibility. The processor has been synthesized for both low-end and high-performance FPGA families from different vendors, achieving a highest frequency of 268.67 MHz and resource consumption comparable to existing RISC-V designs.

HKU Scholars Hub

Mixed-architecture process scheduling on tightly coupled reconfigurable computers

Author: Hamilton BK, Inggs M, So HKH
Publication date: 01/01/2014

The design and implementation of a multitasking runtime system for mixed-architecture applications on a tightly coupled FPGA-CPU platform is presented. The runtime environment and the user applications assume an underlying machine that encompasses multiple computing architectures within a unified machine model. Using this model, a unified process scheduling mechanism was developed that enables concurrent execution of multiple mixed-architecture processes. Scheduling and allocation strategies, including blocking and preemption, were implemented and evaluated with respect to performance and fairness on a Xilinx Zynq platform using a mix of synthetic workloads.

HKU Scholars Hub

A soft coarse-grained reconfigurable array based high-level synthesis methodology: Promoting design productivity and exploring extreme FPGA frequency

Author: Lin CY, Liu C, So HKH
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2013

Compared to a typical software development flow, the productivity of developing FPGA-based compute applications remains much lower. Although the use of high-level synthesis (HLS) tools may partly alleviate this shortcoming, the lengthy low-level FPGA implementation process remains a major obstacle to high productivity computing, limiting the number of compile-debug-edit cycles per day. Furthermore, high-level application developers often lack the intimate hardware engineering experience that is needed to achieve high performance on FPGAs, which undermines the usefulness of FPGAs as accelerators. To address the productivity and performance problems, an HLS methodology that utilizes soft coarse-grained reconfigurable arrays (SCGRAs) as an intermediate compilation step is presented. Instead of compiling high-level applications directly to circuits, the compilation process is reduced to an operation scheduling task targeting the SCGRA. © 2013 IEEE.

HKU Scholars Hub

Architecture for dual-mode quadruple precision floating point adder

Author: Bogaraju SV, Jaiswal MK, So HKH
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2015

This paper presents a configurable dual-mode architecture for a floating point (F.P.) adder. The architecture (named QPdDP) works in dual mode and can operate in either quadruple precision or dual (two-parallel) double precision. The architecture follows the standard state-of-the-art flow for a floating point adder. It targets computation on normal as well as sub-normal operands, along with support for exceptional case handling. The key sub-components in the architecture are redesigned and optimized for on-the-fly dual-mode processing, which enables efficient resource sharing for dual precision operands. The data-path is optimized for minimal multiplexing circuitry overhead. The presented dual-mode architecture provides SIMD support for double precision operands, along with high (quadruple) precision support. The proposed architecture is synthesized as an ASIC implementation using UMC 90nm technology. It is compared with the best available works in the literature and shows better design metrics in terms of area, period and area × period, along with more computational support.

HKU Scholars Hub

Direct Virtual Memory Access from FPGA for High-Productivity Heterogeneous Computing

Author: Choi YM, Ng HC, So HKH
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2013

Heterogeneous computing utilizing both CPU and FPGA requires access to data in the main memory from both devices. While a typical system relies on software executing on the CPU to orchestrate all data movements between the FPGA and the main memory, our demo presents a complementary FPGA-centric approach that allows gateware to directly access the virtual memory space as part of the executing process without involving the CPU. A caching address translation buffer was implemented alongside the user FPGA gateware to provide runtime mapping between virtual and physical memory addresses. The system was implemented on a commercial off-the-shelf FPGA add-on card to demonstrate the viability of such an approach in low-cost systems. Experiments demonstrated a reasonable performance improvement when compared to a typical software-centric implementation, while the number of context switches between FPGA and CPU in both kernel and user mode was significantly reduced, freeing the CPU for other concurrent user tasks. © 2013 IEEE.
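
The idea of a caching address translation buffer can be illustrated with a small software model. This is only a sketch under stated assumptions, not the gateware described in the abstract: the 4 KiB page size, the LRU replacement policy, and the names `TranslationBuffer`, `page_table` and `capacity` are all hypothetical choices made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes (not stated in the abstract)


class TranslationBuffer:
    """Toy model of a caching virtual-to-physical address translation buffer."""

    def __init__(self, page_table, capacity=4):
        # page_table maps virtual page number -> physical frame number.
        self.page_table = page_table
        self.capacity = capacity      # number of cached translations
        self.cache = OrderedDict()    # kept in least-recently-used order
        self.hits = 0
        self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.cache:
            self.hits += 1
            self.cache.move_to_end(vpn)  # mark as most recently used
        else:
            self.misses += 1
            # A KeyError here stands in for an unmapped page.
            self.cache[vpn] = self.page_table[vpn]
            if len(self.cache) > self.capacity:
                self.cache.popitem(last=False)  # evict least recently used
        return self.cache[vpn] * PAGE_SIZE + offset
```

In this model, repeated accesses to the same page are served from the cache, so only the first access pays for a page-table lookup.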

HKU Scholars Hub

A model for peak matrix performance on FPGAs

Author: Leong PHW, Lin CY, So HKH
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 01/01/2011

Computations involving matrices form the kernel of a large spectrum of computationally demanding applications for which FPGAs have actively been utilized as accelerators. The performance of such matrix operations on FPGAs is related to underlying architectural parameters such as computational resources, memory and I/O bandwidth. A model that gives bounds on the peak performance of matrix-vector and matrix-matrix multiplication operations on FPGAs based on these parameters is presented. The architecture and efficiency of existing implementations are compared against the model. Future trends in matrix performance on FPGA devices are estimated based on the performance model and system parameters from the past decade. © 2011 IEEE.
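
How compute resources and bandwidth jointly bound matrix performance can be sketched with a generic roofline-style estimate. This is not the model from the paper, whose details are not given in the abstract; the function names, the 4-byte word size, and the assumption that each operand is transferred exactly once are hypothetical simplifications.

```python
def peak_gflops(compute_gflops, bandwidth_gbs, flops_per_byte):
    """Attainable throughput: the lesser of the compute and bandwidth limits."""
    return min(compute_gflops, bandwidth_gbs * flops_per_byte)


def matvec_intensity(n, bytes_per_word=4):
    """Flops per byte for an n x n matrix-vector product, each word moved once."""
    flops = 2 * n * n                          # one multiply and one add per element
    traffic = (n * n + 2 * n) * bytes_per_word  # matrix, input vector, output vector
    return flops / traffic


def matmul_intensity(n, bytes_per_word=4):
    """Flops per byte for an n x n matrix-matrix product, each word moved once."""
    flops = 2 * n ** 3
    traffic = 3 * n * n * bytes_per_word        # two input matrices, one output
    return flops / traffic
```

Under these assumptions, matrix-vector multiplication performs roughly half a flop per byte and tends to be bandwidth bound, whereas the intensity of matrix-matrix multiplication grows with `n`, so it can reach the compute limit when enough on-chip memory is available for reuse.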

HKU Scholars Hub